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Abstract - Two new parallel optimization algorithms based on the simplex method 
are described. They may be executed by a SIMD parallel processor architec- 
ture and be implemented in VLSI design. Several VLSI design implementations 
are introduced. An application example is reported to demonstrate that the 
algorithms are effective. 


1 Introduction 

Optimal system control is an important part of modern control theory. The kernel problem 
is optimizing the behavior of systems, as in minimizing the energy or cost required to 
accurately reach some required terminal state. The search for the control which attains the 
desired objective while minimizing (or maximizing) a defined system criterion constitutes 
the fundamental problem of optimal control [1] [2] [3] . 

To date, practical applications of optimal control theory are still quite few in number. 
For a class of systems with fast response, the implementation of a real-time on-line optimal 
controller has been difficult. The time-consuming computation required for optimal con- 
trol solutions has been a major obstacle. Modern supercomputers with parallel processing 
architectures and very fast computation speed are not a practical solution because of their 
weight, size and cost. Fast computation, small size and low cost are basic requirements 
for the controller. In this paper, the technique of an algorithmically specialized computer 
is suggested to achieve an optimal controller which can realize both real-time computa- 
tion and on-line control for a rapidly responding system. Effective algorithms, parallel 
architecture, and VLSI implementation are involved in the design of the controller. 

Efficient optimization algorithms are very necessary for solving the two-point boundary- 
value (TPBV) problems which arise in optimal control. Chazan and Miranker in 1970 [4] 
originally proposed a nongradient-based parallel search algorithm for unconstrained mini- 
mization which is suitable for execution using an array of parallel processors. The algorithm 
involves the parallel execution of n linear searches along the same direction, starting from 
n points, when the dimension of the vector of unknowns is n. Travassos and Kaufman [5] 
have applied the algorithm to the solutions of optimal control systems. Housos and Wing 
in 1984 [6] reported a parallel pseudo-conjugate direction algorithm that performs a set of 
n linear searches in parallel along different search directions. Those parallel optimization 
algorithms proceed by univariate optimization so that they are MIMD-type algorithms 
[7]. Although they may be used to solve the optimal control problem, it is not easy to 
shift them to VLSI design for a small size and low cost controller. Two new parallel, 
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nongradient-based algorithms for unconstrained optimization are presented in this paper. 
In contrast to existing parallel optimization algorithms, the new parallel algorithms are 
based on a simplex method and are SIMD-type algorithms [7]. The advantage of the new 
algorithms is that they do not need a linear search and may be easily shifted to VLSI 
implementation. 

Three kinds of design schemes: digitally controlled analog, hybrid, and pure digital, are 
presented in this paper. Their VLSI implementation and their performances are discussed. 

2 New parallel optimization algorithms 

The following unconstrained minimization problem is considered: 


min /(X), X e 3 R rt , 

where / : — ► 3?, and is usually non-quadratic and nonlinear. 

We wish to find a point X* numerically such that, if e > 0, then 

f (X*) < /(X), for all X : ||X - X*|| < e. 

Two parallel simplex algorithms, PSl and PS2, which are based on an improved simplex 
method [8] [9] and use parallel function evaluations, are stated below. 

Algorithm PSl: The algorithm PSl predicts four candidate vertices simultaneously 
in one iteration. Therefore at least four parallel processors are required. Each iteration 
includes two phases: the first is for parallel evaluations and the second is for choosing a new 
vertex to generate a new simplex via function value comparison. The computation time for 
the function evaluations is always longer than the time for the function value comparison. 
The execution of parallel function evaluations effectively reduces computation time since 
it is a major part of the time for one iteration cycle. It is also important that the parallel 
function evaluations are of the SIMD type. This allows the algorithm to proceed in the 
SIMD parallel architecture. The number of parallel function evaluations required by PSl 
is only about half the number required by the improved simplex algorithm of Nelder and 
Mead [8]. 

The algorithm PSl is described below: 

(0) Initial simplex: 

(0a) Set the iteration number k — 0. 

(Ob) Starting point v° = (x^x®, ' ' ' > I n) * s gi ven - An initial simplex 
V° = • • , v° +1 ] is formed in parallel by: v° = (1 — 6)u° 

o f *° + SEiZ*, if *° ± 0 

v *+i ( u° -(- 6Ei, otherwise 

where E{ = {0 . . . 0 1 0 . . . 0 }, i = 1,2, ... ,n, and 6 = 0.1 
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(Oc) 


Parallel evaluation of the function value at the vertices of V 0 

Y° = [/K), /(»$),• -,/M+i)]- 


(1) Parallel sorting: 

Set k = k + 1. Let S k = [XI X k , • ■ • , X‘ +x ] and F k = [f k , /£,■•■, /* +1 ] be ordered 
V k ~ l and Y k -\ 


Find d,d = max(||A? - X*||),i = 2,3,...,n + 1, if d is small enough, then stop, 
otherwise continue as follows. 


Denote 

Xi by X k , X,h by X k , Xh by X k +1 , 
fi by f k , f.h by f k , fh by f k +1 - 

The centroid X is the mean of the vertices with i n + 1, i.e., 


X = 


i E x . fc 

n i= i 


(2) Parallel computation: 

X c = (1 - /3)X + (3X h 
X a = (l+(3)X -(3X h 
X T — (1 + c*)X oXh 
X t = (1 + j)X - ~fX h 

Parallel function evaluations 

fc = f(X c ), f a = f(X a ), fr = f(X T ) and U = /(*<) 

(3) Comparison and selection of new point for updating simplex: 

(3a) If f e < f r < fi, then X h = X e and f h = f e - 
(3b) If f,h > fr > f\ or /«>/<■> /i> then X h = X T and fh = fr- 
(3c) If f h > f r > f.h and f a < f.h, then X h = X a and f h = f a - 
(3d) If f T > fh and f c < f.h , then X h = X c and f h — fc- 

(3e) If f r > fh and f c > f, h or if f h >fr> f.h and f a > f. h , do shrinkage in parallel: 
X, = [Xx,X,i\, where X. } = (X, + XJ/2, j = 2, 3, . . . , n + 1, evaluate and 
update F k = [/i, f{X tl ), f(X Si ),--*, f{X . n+l )] and S k = X., then do 

(4) Update the simplex: 

let V k = S k , Y k = F k , then return to (1). 
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Algorithm PS2: The algorithm PS2 is developed from the algorithm PS1 by increas- 
ing the parallel processors to sixteen. Twenty processors in total are utilized to predict 
twenty candidate vertices simultaneously in one iteration. The algorithm PS2 if more effec- 
tive than the algorithm PSl. One iteration of the algorithm PS2 is functionally equivalent 
to two iterations of the algorithm PSl. Thus the algorithm PS2 will do the same function 
in roughly half the time of the algorithm PSl. Algorithm PS2 is also of the SIMD-type. 

The algorithm PS2 is described below: 


(0) The same as Step (0) of Algorithm PSl; 

(1) The same as Step (1) of Algorithm PSl; 

(2) Parallel computation: 


(2a) Compute the first level direction points (four in total) in parallel: 

X c = (l-/?)X+/?X h 

X a = (l + 0)X-0X h 

X r = (1 + a)X - aX h 

X e = (l+ 7 )X- 7 X h 

and find 4 conductive points in parallel 

Xi = i(£ Jr* Xi + X,), > =V, V, V, V, 

(2b) Compute the second level direction points (sixteen in total) in parallel: 

X ie = (1 - 0)Xi + flX,h 
Xu = {l+0)X i -pX, h 
X iT = (1 + a)Xi - aX,h 
Xie = (1 + l)Xi - ~fX,h 
where i =V, ’a’, V, V, 

(2c) Parallel function evaluations 

it = f(x<) 

fi, = f(X ic ), /i. = /(X,„), k = f{X„) and /« = f(X„) 
where i =’c’, ’a’, V’, ’e’, 

(3) Comparison and updating simplex: 

Set m = 0. 


(3a) If /,</,< fi, j ”V, or 

if f.h > fr > fi or fe> fr < fp, j =V, or 
if fh> ft > f.h or fa < f,h, j =v, or 


if /, > fh and f c < f,h, j =V, set m = m + 1 and do (3b) 

if f T > f h and f c > f, h , or if fh > fr > f.h and f a > f th , do shrinkage in 
parallel, X, = {X\,X tj }, where = ( Xj + X\)/2, j = 2, 3, ■ • • , n + 1, evaluate 
F h = {f u f(X, 3 ), f(X ti ),■••, /(X,„ +I )} and let = X., then do (4). 
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(3b) Replacing: 

X h = X.h, fh = U, x, h = Xj, f, h = /,, and X, = X j{ , /, = /ji- 
If m = 1 do (3a), otherwise do (4). 

(4) The same as the step (4) of the algorithm PS1. 

3 Example of application to real-time optimal control 

The air-to-air missile-target intercept is a practical real-time optimal control application. 
A typical intercept mission from missile launch to intercept, may take only a few seconds. 
It is almost impossible to achieve true optimal control during such a short time interval 
with present technology. The efficiency of the algorithm PS2 for real- time application of 
optimal control is demonstrated in this section via simulation of a 3-dimensional air-to-air 
missile-target intercept problem. An optimal guidance law that minimizes missile energy 
expenditure with fixed final time tf and fixed final state (zero miss range) is derived m 
Ref [9] using nonlinear optimal control theory. This section focuses on solving^ the non- 
linear TPBV problem (NTPBVP) which arises in the intercept problem by the “shooting 
method” using Algorithm PS2. 



Figure 1 shows the 3-dimensional intercept scenario. The target T moves in a straight 
line at constant velocity ut and the missile M moves at controlled acceleration a(t) and its 
direction angles are a(t ) and /3(f)- An on-board optimal controller in the missile calculates 
and provides, a(t), a(t) and /3(f) to the missile thruster. 

The NTPBVP obtained is 
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where 


*1 

= 

*4 
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( 1 ) 


Notice that the initial conditions x 10 to x 6 o aye constant and the terminal values X \ (</) 
to * 3 (</) and xxq(</) to Xi 2 (</) are zeros. The first six equations of (1) are the dynamics 
of the system. The second six equations are the co-state equations. The shooting method 
starts with estimating a set of initial values (x 7 ( 0 )x 8 ( 0 )x 9 ( 0 ))*\ then integrates (1) forward, 
with given and estimated initial values xx(0) to X| 3 (0). The resulting terminal values are 
usually different from the given ones. An error function E is defined by 


E — \J Xi(</) 2 + X2(t/) 2 + X3(</) 3 + X l0 (</) 3 + Xu (</) 2 + X l2 (t/) 2 
The shooting method attempts to minimize the error function E: 


■ ( 2 ) 


min E — > 0 


(3) 


This can be done by means of the algorithm PS2 to update the estimated initial values 
until (3) is satisfied. 

The initially given condition is 1 


*10 = 

20000 

(ft) 

*20 = 

3000 

(ft) 

*30 = 

2500 

(ft) 

*40 = 

-972 

(ft /sec) 

*50 = 

-972 

(ft/sec) 

*60 ~ 

0 

(ft/sec) 


and the fixed final time is if = 5(sec). 

Assume the target velocity vx is constant, the travel path of the target will be a straight 
line. The target path may be calculated correctly by 


1 Data taken from Ref. [10] 


3rd NASA Symposium on VLSI Design 1991 


11.2.7 


= Xq T “H V T i 
yr{i) = Vot 
zr(t) = zqt 


where [x 0 r Vot z ot] T is the target’s initial position so that open-loop optimal control 
may be employed by the missile. 

To come up with open-loop optimal control numerically, one must first solve the NTP- 
BVP (1). For a set of rough initial estimations 


X70 = 1 

^80 = 1 
£90 — 1 


.using the algorithm PS2, the resulting solutions are in Table 1: 


TI (sec) 

OIV 

PFE 

CMR (ft) 

RMR (ft) 

[0 5] 

x 7 (0)=0. 65993 
x 8 (0)=-0.08107 
x 9 (0)=0. 92175 

114 

0 

1.287e-9 


Table 1: The numerical results of the intercept scenario 


TI-Time interval, 

OlV-Optimal initial values, 
PFE-Parallel function evaluations, 
CMR-Constraint of miss range, 

RMR-Real miss range. 


As a rough estimation of computation time, if the PFE < 120 in five seconds as shown 
in Table 1, then the real-time optimal control can be implemented for this air-to-air missile- 
target intercept problem. This is very possible with modern VLSI techniques. In the next 
section several VLSI design possibilities are introduced. 


4 VLSI implementations 

This section presents design possibilities for potential real-time, on-line, optimal con- 
trollers. These optimal controllers will be algorithmically specialized parallel computers 
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consisting of a few VLSI chips. Small special-purpose optimal controllers should be useful 
for certain optimal control systems, such as aircraft control, missile guidance, etc. 

To conserve space, only the algorithm PS1 is considered for VLSI implementation in 
this section. The design procedure can be used for algorithm PS2, but the resulting circuit 
will be more complex. 

4.1 Schematic design 

A schematic diagram of the implementation of the algorithm PSl is shown in Figure 2. 
The dashed box performs the main function of the algorithm PSl. In order to be useful for 
various control systems, a parallel function evaluator (PFE) is separated from the dashed 
box. The PFE is an array of four 2 parallel processors. The complete system includes two 
separate parts: the main algorithm part and the PFE. The main part is the algorithm 
itself in which the design is fixed. The PFE is more flexible and is different from system 
to system. 



Figure 2: Block diagram of the algorithm PSl 

The operation of the system outlined in Figure 2 may be described as follows. 

The IS, connected to the input Xo, is for the generation of an initial simplex [Xoi • ■ • 
-^o(n+i)]- Via the multiplexer (Mul), the function values on the initial simplex may be 
evaluated by the external PFE. The outputs of the PFE, [f(Xi), ■ ■ • , /(X n+1 )], via the 

J Twenty for the algorithm PS2. 
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demultiplexer (DMul) and a set of updating switches (USw), are saved in the simplex 
memories (SMe). The Mul and the DMul also pass the initial simplex vertexes, denoted 
as [Xi •••X n+ i], to the SMe. Then a basic simplex with its function values is stored for 
further operations. 

According to the algorithm PS1, the stored simplex must be updated. To do this, the 
simplex in the SMe must be first sorted by a sorting circuit (Sorting). A sorted simplex, 
with function values [/i, • • • , f.h, fh], is available at the output of the 
Sorting. From it four direction points, X c , X a , X r and X e , can be found in the direction 
points module (DP). Similar to the initial simplex, they and their function values, f r 

and / e , evaluated by the PFE are stored in the direction memories (DMe) via the Mul and 
the DMul. A new point module (NP) compares [f c , f a , f T and f e ] with [/j, f,h and fh] and 
then selects a proper one of the direction points, denoted by X ' h , with its function value 
f h . Via the USw the X £ replaces the vertex Xh to update the basic simplex. The positions 
of Xh and fh are indexed by one of the signals g x to g n+ \ generated by the Sorting. 

In case no new point can be selected, the NP will send out a digital signal Ds. Through 
it the DTC generates another control signal Cs to the Mul and the DMul, then a shrunken 
simplex from the simplex shrinkage (SSh), [Xi,X,i, • • • ,X, n ], with its function values is 
passed to the SMe, so that the basic simplex is updated. 

The simplex size module (SSi) and the convergence testing module (CT) monitor the 
size of the sorted simplex and its minimal function value. Together with the size switches 
(SSw) and the Sorting, when one of them satisfies a given criterion, the CT will send a 
“stop” signal to finish the iterations. 

The digital timing controller (DTC) is necessary to control the timing of the whole 
system. The functions of the DTC may be stated by defining its inputs and outputs as 
follows: 


Inputs: 

Start: 

T: 

Repeat: 

Stop: 

Ds: 

Outputs: 

Ce: 

Ci: 

Cs: 

Cd: 

Cr: 

Cm: 

Ct: 


actuates the DTC and starts the computation, 

a parameter given for setting up the width of the Ce’s active 

interval, 

after the computation, reactuates the DCT and repeats the 
solutions if necessary, 

stops the iteration when the solutions are available, 
active when shrinkage simplex is needed, sets up the Cs, 


actuates the PFE, the Mul and the DMul, its active length is given 
by the input T, 

passes the initial simplex and its function values to the SMe via the 
Mul and the DMul, 

passes the shrinkage simplex and its function values to the SMe via 
the Mul and the DMul, it is controlled by the input Ds, 
passes the direction points and their function values to the DMe 
via the Mul and the DMul, 

repeats the computation, it is controlled by the input ” Repeat”, 

actuates the SMe and the DMe, 

tests the simplex size, active at each iteration. 


The design of the DTC is a normal digital logic design and is not included here. 
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To meet various application requirements, three kinds of design schemes (1) digitally 
controlled analog, (2) hybrid, and (3) all-digital, are suggested here. The design of digitally 
controlled analog is due to analog computation on both the PFE and the main algorithm 
part. The hybrid design uses digital computation for the main algorithm. Finally the 
digital design is a pure digital scheme. Due to their different characteristics, they are 
employed in different circumstances as listed below. 

For the digitally controlled analog controller: 

1. Accuracy limited but faster computation. 

2. Limited memory period, 

3. More efficient for low frequency systems with shorter operation time. 

For the hybrid controller: 

1. Same as 1 above. 

2. Unlimited memory period. 

3. More efficient for low frequency systems with longer operation time. 

For the digital controller: 

1. Accuracy unlimited but slow computation. 

2. Unlimited memory period. 

3. No strong relation to frequency and time of system operation. 

4.2 Digitally controlled analog scheme 

In general, analog computation is faster than digital computation. This suggests the PFE 
and the main algorithm part (not including the DTC) may be implemented by analog 
techniques. However analog long-time memory is not easily implemented on a VLSI chip. 
Memory time is strongly depended on the problem’s complexity. If the requirement for 
memory time is too long and the size requirement is critical, the hybrid computation 
scheme should be considered as below. 


4.3 Hybrid scheme 

The hybrid scheme includes digital computation and memories in the main algorithm part. 
But the PFE still uses analog techniques. In practice, the PFE is a parallel electronic 
differential analyzer (EDA) which consists of some integrators. Integration computations 
is more convenient with analog circuiting than with digital. Keeping the PFE in analog 
will reduce computation time. A hybrid scheme is suggested in Figure 3. The system has 
three sections: the PFE, the digital algorithm processor and the linkage system, in which 
some an alog/ digit al (A/D) and digital /analog (D/A) converters are essential. 

Each numerical value in the digitally controlled analog scheme mentioned above, such 
as each function value, each element value of a simplex vertex and each constant value, 
will be described in m-bit form and be stored in a m-bit register. In order to achieve 
parallel computation, the input and output to these registers are parallel m-bit data buses. 
Furthermore, all of the digital devices in this system are parallel. 
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Figure 3: Hybrid scheme 


4.4 Digital scheme 

Based on the hybrid scheme, a pure digital optimal controller maybe obtained by designing 
a digital PFE. A key point is to design, for the PFE, a digital integrator, which is very 
different from an analog one. The design of the digital PFE is related to both the solution 
methods and the particular problem, and may be separated into two parts. The first 
one is an algorithmically specialized unit of a solving algorithm, such as the Runge-Kutta 
algorithm, and another is a computing unit of a given differential equation. 

5 Conclusion 

Two new par all el optimization algorithms, PS1 and PS2, based on the simplex method 
are described. Four processors are required for PS1 and twenty for PS2. They may be 
executed by a SIMD parallel processor architecture and may be easily shifted to VLSI 
design. 

The numerical result of a 3-dimensional air-to-air missile-target intercept problem has 
been reported to demonstrate that the algorithms are effective and the real-time optimal 
controllers are feasible for a class of optimal control systems with fast response. 

As a design example, the algorithm PS1 has been shifted to a VLSI implementations. 
Three types of controller design schemes have been presented: (1) digitally controlled 
analog, (2) hybrid, and (3) pure digital controller. They can be employed satisfactorily for 
different application requirements. 

In general, the optimal controllers converge rather rapidly, once the estimation of an 
initial value is found such that the evaluation of the error function E being minimized re- 
sults in a number in the neighborhood of zero. However, if the problem to be solved is very 
sensitive to small perturbations in the initial co-state vector, convergence to an optimal 
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solution may be slow, or even fail. This case was not considered in this research. To over- 
come this problem a method [5] suggested by R. Travassos and H. Kaufman may be added 
in the design of the optimal controllers. This approach is currently under consideration. 
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